Collaborative Spam Filtering with the Hashing Trick
Authors
Abstract
User feedback is vital to the quality of the collaborative spam filters frequently used in open membership email systems such as Yahoo Mail or Gmail. Users occasionally designate emails as spam or non-spam (often termed ham), and these labels are subsequently used to train the spam filter. Although the majority of users provide very little data, collectively the amount of training data is very large (many millions of emails per day). Unfortunately, there is substantial deviation in users’ notions of what constitutes spam and ham. Additionally, the open membership policy of these systems makes them vulnerable to users with malicious intent: spammers who wish to see their emails accepted by any spam filtration system can create accounts and use these to give malicious feedback to ‘train’ the spam filter into giving their emails a free pass. When combined, these realities make it extremely difficult to assemble a single, global spam classifier.
Similar resources
Artificial Immune System for Collaborative Spam Filtering
Artificial immune systems (AIS) use the concepts and algorithms inspired by the theory of how the human immune system works. This document presents the design and initial evaluation of a new artificial immune system for collaborative spam filtering. Collaborative spam filtering allows for the detection of not-previously-seen spam content, by exploiting its bulkiness. Our system uses two novel a...
Spam Filtering Based On The Analysis Of Text Information Embedded Into Images
In recent years anti-spam filters have become necessary tools for Internet service providers to face up to the continuously growing spam phenomenon. Current server-side anti-spam filters are made up of several modules aimed at detecting different features of spam e-mails. In particular, text categorisation techniques have been investigated by researchers for the design of modules for the analys...
Towards Symbiotic Spam E-mail Filtering
This position paper discusses the use of symbiotic filtering, a novel distributed data mining approach that combines content-based and collaborative filtering for spam detection.
Personalised, Collaborative Spam Filtering
The state of the art sees content-based filters tending towards collaborative filters, whereby email is filtered at the MTA with users feeding information back about false positives and negatives. While this improves the ability of the filter to track concept drift in spam over time, such approaches make assumptions implicit in centralised spam filtering, such as that all users consider the sam...
Collaborative Blog Spam Filtering Using Adaptive Percolation Search
We propose a novel collaborative filtering method for link spams on blogs. The key idea is to rely on manual identification of spams and share this information about spams through a network of trust. The blogger who has identified a spam tells a small number of fellow bloggers (content implantation), and those who have not heard about it start a search using an adaptive percolation search, comb...